In this session we will cover a few topics on data visualization.
1. We are going to reproduce a protein-protein interaction network I’ve made for a paper published recently. For that, we are going to use Cytoscape.
2. We will explore different types of variables and plot them using the R language environment. More specifically we will use RStudio Cloud and a package called ggplot2.
In de Lomana et al. [2020], we studied aspects of translational regulation in Halobacterium salinarum. One of the questions we needed to answer to support one of our hypotheses was:
We know that transcription and translation happen in different compartments in the eukaryotic cell: transcription happens inside the nucleus, and translation happens in the cytoplasm.
In the prokaryotic cell, we don’t have a membrane to separate the genetic material from the cytoplasm, so transcription and translation are likely to happen simultaneously. Coupled transcription and translation is a fact in Bacteria. But what about Archaea?
Source: https://www.mun.ca/biology/scarr/iGen3_05-09.html
In the Baliga Lab, a few years ago, Mark Facciotti and his colleagues performed a targeted coimmunoprecipitation experiment to find out which proteins associate with specific proteins of the transcription machinery, the general transcription factors of H. salinarum [Facciotti et al., 2007].
Using those results, we checked whether proteins of the translational machinery were present in the pulldown fractions of transcription proteins. Indeed, we were able to find a few, supporting our hypothesis of coupled transcription and translation:
RPs physically interact with transcription complex components. Diamonds represent RPs; squares represent transcription complex components. Tagged proteins used as bait in the immunoprecipitation experiment are highlighted by a black border. Arrowheads link bait to coimmunoprecipitated proteins. We labeled each of the seven modules obtained by the Newman-Girvan clustering algorithm using a different color [de Lomana et al., 2020].
This is the structure of a Simple Interaction File (SIF)
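As a rough illustration of the format: a SIF file has no header and lists one interaction per line as source node, interaction type, and target node, separated by tabs (or spaces). The sketch below writes a toy SIF file from R and prints it back. The protein names and the interaction type `pp` are made up for illustration and are not taken from ppi.sif.

```r
# Toy interactions (hypothetical names, NOT the content of ppi.sif)
edges <- data.frame(
  source = c("proteinA", "proteinA", "proteinB"),
  interaction = c("pp", "pp", "pp"),
  target = c("proteinB", "proteinC", "proteinC"),
  stringsAsFactors = FALSE
)

# SIF has no header line; here the three columns are tab-separated
sif_path <- tempfile(fileext = ".sif")
write.table(edges, file = sif_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Show the raw file content, one interaction per line
sif_lines <- readLines(sif_path)
cat(sif_lines, sep = "\n")
```

Each printed line reads as "source, interaction type, target", which is the layout Cytoscape expects when importing a network in this format.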
1. Go to the Cytoscape download page. Download and install it.
2. Open Cytoscape and install the following apps:
Apps -> App Manager -> Search: clusterMaker2 -> Select listing -> Click on Install button
Apps -> App Manager -> Search: Color Cast -> Select listing -> Click on Install button
Apps -> App Manager -> Search: yFiles -> Select listing -> Click on Install button
1. Import the protein-protein interaction file ppi.sif.
File -> Import -> Network from URL ->
https://alanlorenzetti.github.io/dataVisSession2020/data/ppi.sif
2. Import the protein information table.
File -> Import -> Table from URL -> https://alanlorenzetti.github.io/dataVisSession2020/data/ppiFunCat.tsv ->
Where to Import Table Data: To selected networks only -> Click on OK button
3. Click on Style tab. Let’s change the design of our network.
Sample1 preset style.
label column.
class of proteins. Rectangles will represent Transcription class. Diamonds will represent Translation class.
2.35.
Edge tab. Make the edges thicker (size 1.5) and black.
Target Arrow Shape; Delta) to the end of edges.
4. Apply the Newman-Girvan modularity algorithm to find modules of highly interconnected proteins. Use the default parameters.
Apps -> clusterMaker -> Community Cluster (GLay)
5. Change the color of nodes according to the modules. We will use a plugin called Color Cast to make our lives easier. Select __glayCluster as the target data column. We’ll apply the Set2 colors.
Tools -> Color Cast -> Color Cast
6. Remove all the nodes not classified as Transcription or Translation.
Node Table panel, order the class column and select all proteins of those classes.
Select nodes from selected rows.
Select -> Nodes -> Hide Unselected Nodes.
7. Apply an automatic layout.
yFiles Hierarchic Layout Selected Nodes.
Layouts -> Hierarchic Layout Selected Nodes.
8. Save your network and export as an image file.
File -> Save
File -> Export -> Network to Image
ggplot2 package
install.packages("ggplot2")
# loading ggplot2
library(ggplot2)
theme_set(theme_bw())
# loading our dataframe
haloExp = read.delim("https://alanlorenzetti.github.io/dataVisSession2020/data/haloExpression.tsv")
Counts:
Measures:
Ratios:
What kind of plot is suitable for my data? We have to think about i) the number of variables, ii) the types of variables, and iii) the goal of the visualization.
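One way to make this reasoning concrete is a small helper that looks at how many variables are given and whether each is numeric or categorical, then suggests a ggplot2 geom. This is only a sketch: `suggest_geom` is a hypothetical function written for this example (it is not part of ggplot2), the toy values are invented, and the goal of the visualization still has to be judged by you.

```r
# Hypothetical helper: suggest a ggplot2 geom from the number and
# type of variables (numeric vs. categorical)
suggest_geom <- function(df, x, y = NULL) {
  x_num <- is.numeric(df[[x]])
  if (is.null(y)) {
    # One variable: numeric -> distribution, categorical -> counts
    return(if (x_num) "geom_histogram" else "geom_bar")
  }
  y_num <- is.numeric(df[[y]])
  # Two numeric variables -> scatter plot
  if (x_num && y_num) return("geom_point")
  # One numeric and one categorical variable -> boxplot per category
  if (x_num != y_num) return("geom_boxplot")
  # Two categorical variables -> counts of each combination
  "geom_count"
}

# Toy data frame (invented values; column names mimic haloExp)
toy <- data.frame(
  length = c(300, 1200, 750),
  GC = c(0.62, 0.68, 0.65),
  biological_class = c("Transcription", "Translation", "Translation")
)

suggest_geom(toy, "length")
suggest_geom(toy, "biological_class")
suggest_geom(toy, "length", "GC")
suggest_geom(toy, "biological_class", "GC")
```

The four calls cover the cases plotted in this session: one numeric variable, one categorical variable, two numeric variables, and one categorical plus one numeric variable.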
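# one numeric variable: histogram of length
ggplot(data = haloExp, mapping = aes(x = length)) +
  geom_histogram()

# one numeric variable: density curve of length
ggplot(data = haloExp, mapping = aes(x = length)) +
  geom_density()

# one categorical variable: number of rows per biological class
ggplot(data = haloExp, mapping = aes(x = biological_class)) +
  geom_bar() +
  coord_flip()

# two numeric variables: mRNA vs. protein expression
ggplot(data = haloExp, mapping = aes(x = mRNA_expression, y = protein_expression)) +
  geom_point()

# three variables: same scatter plot, points colored by GC
ggplot(data = haloExp, mapping = aes(x = mRNA_expression, y = protein_expression, color = GC)) +
  geom_point()

# one categorical and one numeric variable: GC per biological class
ggplot(data = haloExp, mapping = aes(y = GC, x = biological_class)) +
  geom_boxplot() +
  coord_flip()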
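The histogram above relies on the default of 30 bins, and ggplot2 prints a message suggesting that you pick a better value. A minimal sketch of setting the bin width explicitly is shown below. It uses a toy data frame with invented lengths as a stand-in for haloExp, and the bin width of 500 and the axis labels are arbitrary choices for illustration.

```r
library(ggplot2)

# Toy lengths (invented values, stand-in for haloExp$length)
toy <- data.frame(
  length = c(150, 320, 480, 510, 760, 900, 1150, 1300, 2100)
)

# Explicit bin width of 500, with bin edges aligned to 0,
# instead of the default 30 bins
p <- ggplot(data = toy, mapping = aes(x = length)) +
  geom_histogram(binwidth = 500, boundary = 0) +
  labs(x = "Length", y = "Count")

p
```

With the real data, trying a few values of `binwidth` is usually worthwhile, since the apparent shape of the distribution can change with the bin size.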
DE LOMANA, A. et al. Selective Translation of Low Abundance and Upregulated Transcripts in Halobacterium salinarum. mSystems, v. 5, n. 4, 28 Jul. 2020.
FACCIOTTI, M. T. et al. General transcription factor specified global gene regulation in archaea. Proceedings of the National Academy of Sciences of the United States of America, v. 104, n. 11, p. 4630–4635, 13 Mar. 2007.
MORAN, M. A. et al. Sizing up metatranscriptomics. The ISME Journal, v. 7, n. 2, p. 237–243, Feb. 2013.